Square
=================

Computes the square of the input data element by element.

.. math::

    \text{dst}_i = \text{src}_i \times \text{src}_i

For each element of the input `src`, the function computes its square and writes it to the corresponding element of `dst`.
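
As a point of reference, the formula above corresponds to a plain scalar loop. The following is a minimal sketch for checking results on a host, not the library's implementation; the name ``ref_square_f32`` is hypothetical and is not declared in ``square.h``.

.. code-block:: c

    /* Hypothetical reference helper (not part of the library):
     * writes src[i] * src[i] into dst[i] for every i in [0, length). */
    static void ref_square_f32(const float *src, float *dst, int length) {
        for (int i = 0; i < length; ++i) {
            dst[i] = src[i] * src[i];
        }
    }

Comparing the output of ``fp_square_s`` or ``fp_square_p`` against such a loop is one simple way to validate a port.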

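Inputs:
    - **src** - Address of the input data.
    - **length** - Number of elements to compute (for complex types, the number of complex values).
    - **core_mask** - Core mask (required only by the shared-memory versions).

Outputs:
    - **dst** - Address of the result; its size is the same as that of `src`.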
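
For the complex variants, ``length`` counts complex values, so a ``cplx64`` buffer of ``length`` elements holds ``2 * length`` floats. The sketch below illustrates this under an assumption that the source does not state: each complex value is stored as an interleaved (real, imaginary) pair. The name ``ref_square_c64`` is hypothetical; it applies the formula with complex multiplication, :math:`(a + bi)^2 = (a^2 - b^2) + 2ab\,i`.

.. code-block:: c

    /* Hypothetical reference helper (not part of the library).
     * ASSUMPTION: complex values are interleaved as re0, im0, re1, im1, ...
     * length is the number of complex values, so each buffer holds
     * 2 * length floats. */
    static void ref_square_c64(const float *src, float *dst, int length) {
        for (int i = 0; i < length; ++i) {
            float re = src[2 * i];
            float im = src[2 * i + 1];
            dst[2 * i]     = re * re - im * im;  /* real part of z * z */
            dst[2 * i + 1] = 2.0f * re * im;     /* imaginary part of z * z */
        }
    }

If the actual storage layout differs (for example, separate real and imaginary planes), only the indexing changes; the element count passed as ``length`` stays the same.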

Supported platforms:
    ``FT78NE``
    ``MT7004``

.. note::
    - FT78NE supports fp32, int8, int16, int32, fp64, cplx64, and cplx128.
    - MT7004 supports fp16, fp32, int16, int32, and cplx64.

**Shared-memory version:**

.. c:function:: void i8_square_s(int8_t* src, int8_t* dst, int length, int core_mask)
.. c:function:: void i16_square_s(int16_t* src, int16_t* dst, int length, int core_mask)
.. c:function:: void i32_square_s(int32_t* src, int32_t* dst, int length, int core_mask)
.. c:function:: void hp_square_s(half* src, half* dst, int length, int core_mask)
.. c:function:: void fp_square_s(float* src, float* dst, int length, int core_mask)
.. c:function:: void dp_square_s(double* src, double* dst, int length, int core_mask)
.. c:function:: void c64_square_s(float* src, float* dst, int length, int core_mask)
.. c:function:: void c128_square_s(double* src, double* dst, int length, int core_mask)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 10

    // FT78NE example
    #include <stdio.h>
    #include <square.h>

    int main(int argc, char* argv[]) {
        float *src = (float *)0xA0000000;   // input in DDR space
        float *dst = (float *)0xB0000000;   // output
        int length = 1000;
        int core_mask = 0xff;
        fp_square_s(src, dst, length, core_mask);
        return 0;
    }

**Private-memory version:**

.. c:function:: void i8_square_p(int8_t* src, int8_t* dst, int length)
.. c:function:: void i16_square_p(int16_t* src, int16_t* dst, int length)
.. c:function:: void i32_square_p(int32_t* src, int32_t* dst, int length)
.. c:function:: void hp_square_p(half* src, half* dst, int length)
.. c:function:: void fp_square_p(float* src, float* dst, int length)
.. c:function:: void dp_square_p(double* src, double* dst, int length)
.. c:function:: void c64_square_p(float* src, float* dst, int length)
.. c:function:: void c128_square_p(double* src, double* dst, int length)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 9

    // FT78NE example
    #include <stdio.h>
    #include <square.h>

    int main(int argc, char* argv[]) {
        float *src = (float *)0x10000000;   // input in L2 space
        float *dst = (float *)0x10001000;   // output
        int length = 1000;
        fp_square_p(src, dst, length);
        return 0;
    }
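
The example above places ``dst`` 0x1000 bytes (4096) after ``src``, which is enough for 1000 ``float`` values (4000 bytes). The source does not say whether overlapping or in-place buffers are supported, so the sketch below assumes they should be avoided and checks a hand-chosen address pair before the call. ``buffers_overlap`` is a hypothetical helper, not part of the library.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper (not part of the library): returns 1 if the byte
     * ranges [a, a + bytes) and [b, b + bytes) overlap, 0 otherwise.
     * Addresses are passed as integers because the buffers are raw,
     * hand-assigned memory regions. */
    static int buffers_overlap(uintptr_t a, uintptr_t b, size_t bytes) {
        uintptr_t lo = a < b ? a : b;
        uintptr_t hi = a < b ? b : a;
        return (size_t)(hi - lo) < bytes;
    }

With the addresses above, ``buffers_overlap(0x10000000, 0x10001000, 1000 * sizeof(float))`` returns 0; raising ``length`` above 1024 would make the two ranges overlap.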